Part 2: Creating, Visualizing, and Analyzing the Hip-Hop Network

Importing necessary libraries

In [126]:
import json
import spotipy
import pandas as pd
import io
import re
import networkx as nx
import matplotlib.pyplot as plt
import collections
from fa2 import ForceAtlas2 
from networkx.readwrite import json_graph
import numpy as np
from matplotlib import colors
import operator
from community import community_louvain
import matplotlib.colors as pltcolors
In [109]:
df_artists = pd.read_csv(r'hiphopArtists_new.csv')
In [110]:
nameList = list(df_artists["artist_name"])
In [5]:
df_artists.tail(10)
Out[5]:
Unnamed: 0 artist_name artist_id
905 905 Junglepussy 6atGQM99IrRfUefJFore1B
906 906 Deem Spencer 4iXaGootDLE50qY42LCdnK
907 907 Denzel Curry 6fxyWrfmjcbj5d12gXeiNV
908 908 Havelock 3AIAml2wCQUDhWt0BffVbA
909 909 Alex Gough 1rNNeas60ogZm9uhv1VZOh
910 910 Maesu 2kk1THOr0gsQaAqOj02tbl
911 911 Larry June 1grN0519h2zYqpRtYbDZAl
912 912 Hoodboi 1521R3ksLyQyFeqdtaSZUZ
913 913 The Internet 7GN9PivdemQRKjDt4z5Zv8
914 914 Channel Tres 4cUkGQyhLFqKHBtL58HYVp
In [111]:
# Load the scraped tracks; the previous `if '<filename>':` test was always true, so it is dropped
with open('songs_with_feats_new3.json', 'r') as f:
    songs = json.load(f)

In this step, we build a dictionary in which every artist's name is a key. Each value is another dictionary that maps every artist they collaborated with to the number of songs the two collaborated on.

In [112]:
total_songs = []
artist_tracks_data = songs["artists"]
artist_collab_dict = {}
        
for artist_name, artist_tracks in artist_tracks_data.items():
    
    # The inner dictionary that is the value for the artist's name as the key
    collabs = {}
    track_lookup = set() 
    
    for track in artist_tracks["tracks"]:
        
        feats = track['feats']
        track_name = track['track_name']
        
        # Ensuring that the artist has collaborators and we haven't processed the same track twice
        if len(feats) > 0 and track_name not in track_lookup:
            track_lookup.add(track['track_name'])
            
            for feat_artist in feats:
                # Checking to see if the featured artist is in our dataframe 
                if feat_artist in nameList:
                    # add a new entry to the dictionary or increment the total collaborations
                    if feat_artist not in collabs:
                        collabs[feat_artist] = 1
                    else:
                        collabs[feat_artist] += 1
                        
    if len(collabs) > 0:
        artist_collab_dict[artist_name] = collabs
In [113]:
artist_collab_dict["Drake"]
Out[113]:
{'J. Cole': 1,
 'Rick Ross': 3,
 'Lil Wayne': 10,
 'Lloyd': 1,
 'Santigold': 1,
 'Omarion': 1,
 'Giggs': 2,
 'Jorja Smith': 1,
 'Travis Scott': 2,
 '2 Chainz': 2,
 'Young Thug': 2,
 'Kanye West': 1,
 'PARTYNEXTDOOR': 4,
 'WizKid': 1,
 'Future': 11,
 'Rihanna': 2,
 'Big Sean': 1,
 'Kendrick Lamar': 1,
 'Nicki Minaj': 3,
 'Alicia Keys': 1,
 'T.I.': 1}
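
As a minimal sketch of the same counting logic on made-up data, the snippet below uses `collections.Counter` in place of the manual if/else. The artist names, the `toy_tracks` structure, and the `known_artists` set are hypothetical stand-ins for `songs["artists"]` and `nameList`; they are not taken from the real dataset.

```python
from collections import Counter

# Hypothetical toy data mirroring the shape of songs["artists"]
toy_tracks = {
    "Artist A": {"tracks": [
        {"track_name": "Song 1", "feats": ["Artist B", "Artist C"]},
        {"track_name": "Song 2", "feats": ["Artist B"]},
        {"track_name": "Song 2", "feats": ["Artist B"]},  # duplicate track, skipped
        {"track_name": "Song 3", "feats": ["Unknown"]},   # not in our artist list
    ]},
    "Artist B": {"tracks": [
        {"track_name": "Song 4", "feats": []},            # no collaborators
    ]},
}
known_artists = {"Artist A", "Artist B", "Artist C"}  # stand-in for nameList

toy_collab_dict = {}
for artist_name, artist_tracks in toy_tracks.items():
    collabs = Counter()
    seen_tracks = set()
    for track in artist_tracks["tracks"]:
        # Skip tracks we have already counted for this artist
        if track["track_name"] in seen_tracks:
            continue
        seen_tracks.add(track["track_name"])
        # Count only featured artists that are part of our artist list
        collabs.update(f for f in track["feats"] if f in known_artists)
    if collabs:
        toy_collab_dict[artist_name] = dict(collabs)

print(toy_collab_dict)
```

Using a set for the membership test (`known_artists`) also avoids scanning the whole name list for every featured artist.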

Here, we create the network. First, we add every artist as a node and give it a weight equal to the number of distinct collaborators featured on that artist's own tracks. We then add the edges: we iterate through every artist and all of their collaborators, adding an edge between each artist and collaborator.

Because the graph is undirected, A-B and B-A are the same edge. If we add an edge between artists A and B and see that we have already recorded collaborations from B to A, we update the weight of that edge to be the sum of the number of songs from A featuring B and the number of songs from B featuring A.

In [114]:
G_rap = nx.Graph()

# we create a dictionary of edges we have already added. Each entry is of the form (A, B) --> num_songs
# where A is the main artist, B is the collaborator, and num_songs = the number of songs they worked on
lookup_edges = {}

# adding nodes
for artist, collabs in artist_collab_dict.items():
    G_rap.add_node(artist, weight=len(collabs), collabs = set(collabs.keys()))
    
# adding edges
for artist, collabs in artist_collab_dict.items():
    for collab, num_songs in collabs.items():
        
        lookup_edges[(artist, collab)] = num_songs
        G_rap.add_edge(artist, collab, weight = num_songs)
        
        # add in collaborations in reverse direction
        if (collab, artist) in lookup_edges:
            num_songs_reverse = lookup_edges[(collab, artist)]
            G_rap.add_edge(artist, collab, weight = num_songs + num_songs_reverse)
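
The weight-merging rule can be sketched in isolation, without networkx. This is a toy illustration under assumed data: the artist names and the `directed_counts` dictionary are hypothetical, and keying the result by `frozenset` is just one way to treat (A, B) and (B, A) as the same undirected pair.

```python
# Hypothetical directed feature counts: (main artist, featured artist) -> songs
directed_counts = {
    ("Artist A", "Artist B"): 2,  # two songs by A featuring B
    ("Artist B", "Artist A"): 3,  # three songs by B featuring A
    ("Artist A", "Artist C"): 1,
}

undirected_weights = {}
for (main, feat), num_songs in directed_counts.items():
    # A frozenset ignores order, so both directions land on the same key
    pair = frozenset((main, feat))
    undirected_weights[pair] = undirected_weights.get(pair, 0) + num_songs

for pair, weight in undirected_weights.items():
    print(sorted(pair), weight)
```

The A-B edge ends up with weight 5, the sum of both directions, which is the same result the notebook's `lookup_edges` check produces.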

Inspecting and Analyzing the network

In [20]:
print("Total number of nodes:" ,len(G_rap))
print("-----------")
print("Total number of links", G_rap.size())
print("-----------")
print("Density", nx.density(G_rap))
Total number of nodes: 720
-----------
Total number of links 4103
-----------
Density 0.015851491268737444
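
As a quick sanity check, the density of an undirected graph is the number of links divided by the number of possible links, n(n-1)/2. The sketch below recomputes it from the node and link counts printed above; the variable names are only illustrative.

```python
n_nodes = 720   # from the output above
n_links = 4103  # from the output above

# Every unordered pair of distinct nodes is a possible link
possible_links = n_nodes * (n_nodes - 1) / 2
density = n_links / possible_links

print(possible_links)
print(density)
```

In other words, only about 1.6% of all possible artist pairs have actually collaborated.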

Next, we list the artists who have collaborated with the most other artists, i.e. the nodes with the highest degree.

In [21]:
degree_sequence = sorted([d for n, d in G_rap.degree()], reverse=True)
rap_sorted = sorted(G_rap.degree, key=lambda x: x[1], reverse=True)
print("- Top 10 by degree -")
for i in range(0,10):
    print("#"+str(i+1)+" :")
    print("Rapper: ",rap_sorted[i][0])
    print("Total collaborators: ", rap_sorted[i][1])
    print('-----')
- Top 10 by degree -
#1 :
Rapper:  Snoop Dogg
Total collaborators:  87
-----
#2 :
Rapper:  Chris Brown
Total collaborators:  83
-----
#3 :
Rapper:  2 Chainz
Total collaborators:  79
-----
#4 :
Rapper:  Gucci Mane
Total collaborators:  76
-----
#5 :
Rapper:  Rick Ross
Total collaborators:  76
-----
#6 :
Rapper:  Young Thug
Total collaborators:  72
-----
#7 :
Rapper:  Future
Total collaborators:  71
-----
#8 :
Rapper:  Busta Rhymes
Total collaborators:  71
-----
#9 :
Rapper:  Lil Wayne
Total collaborators:  71
-----
#10 :
Rapper:  French Montana
Total collaborators:  69
-----

Degree Distributions

In [72]:
# Degree Distribution
degrees = [G_rap.degree(n) for n in G_rap.nodes()]
plt.figure(figsize = (10, 8))
plt.hist(degrees, bins = 20, edgecolor='black', )
plt.xlabel('Number of collaborators')
plt.ylabel('Count')
plt.title('Degree Histogram of Rappers Network')
plt.xticks(list(range(10, 90)[::10]))
plt.show()

Measures of Centrality

Here, we examine several measures of centrality to see which rappers are the most well-connected and influential.

Top Artists by Betweenness Centrality

In [83]:
# Betweenness Centrality
bt_ctrs = [(k, v) for k, v in nx.betweenness_centrality(G_rap).items()]
sorted(bt_ctrs, key=lambda x: x[1], reverse = True)[:20]
Out[83]:
[('Snoop Dogg', 0.0539072730070342),
 ('French Montana', 0.04443079126965523),
 ('Chip', 0.043826144856047376),
 ('Busta Rhymes', 0.04085951321764177),
 ('Chris Brown', 0.03942006350587971),
 ('Giggs', 0.038974791395502806),
 ('Rick Ross', 0.03706952602873264),
 ('Skepta', 0.03247284683548266),
 ('Wiley', 0.030164823999891913),
 ('Tory Lanez', 0.02940530094357589),
 ('Gucci Mane', 0.029186919232268616),
 ('Lil Baby', 0.028966796077123082),
 ('2 Chainz', 0.027802419674560052),
 ('Future', 0.027767416092211797),
 ('Stefflon Don', 0.027611995378990783),
 ('Young Thug', 0.026871523051060917),
 ('Estelle', 0.025190180341218744),
 ('Wiz Khalifa', 0.02418750837805091),
 ('Lil Yachty', 0.022083729493435834),
 ('Rich The Kid', 0.021341210487594107)]

Top Artists by Eigenvector Centrality

In [82]:
# Eigenvector Centrality
eig_ctrs = [(k, v) for k, v in nx.eigenvector_centrality(G_rap).items()]
sorted(eig_ctrs, key=lambda x: x[1], reverse = True)[:20]
Out[82]:
[('2 Chainz', 0.19251625821691187),
 ('Chris Brown', 0.1906224055289425),
 ('Rick Ross', 0.18035465858523805),
 ('Future', 0.17643733761861471),
 ('Lil Wayne', 0.17520973480840035),
 ('Gucci Mane', 0.17390190120567842),
 ('French Montana', 0.15762281562794814),
 ('Snoop Dogg', 0.15686917186189697),
 ('T.I.', 0.1556456670226622),
 ('Young Thug', 0.15425104835368575),
 ('Meek Mill', 0.15086644620204),
 ('DJ Drama', 0.1489532273928032),
 ('Wiz Khalifa', 0.147593716984187),
 ('The Game', 0.14509824870350885),
 ('Yo Gotti', 0.13610102309493408),
 ('Big Sean', 0.1344315435007818),
 ('Busta Rhymes', 0.12722459851724688),
 ('Tyga', 0.12456428133665573),
 ('Nicki Minaj', 0.12445208911640977),
 ('YG', 0.11840119479245816)]

Graphs of Degree vs Eigenvector and Betweenness Centrality

In [77]:
# Degree vs Betweenness Centrality
deg_ctr_dict = nx.degree_centrality(G_rap)
bt_ctr_dict = nx.betweenness_centrality(G_rap)
eig_ctr_dict = nx.eigenvector_centrality(G_rap)

x = [v for k, v in deg_ctr_dict.items()]
y = [bt_ctr_dict[k] for k in deg_ctr_dict.keys()]

plt.figure(figsize=(15,10))
plt.scatter(x, y, alpha = 0.5)
plt.title('Degree Centrality vs Betweenness Centrality')
plt.xlabel('Degree Centrality')
plt.ylabel('Betweenness Centrality')
plt.show()
In [78]:
# Degree vs Eigenvector Centrality
x = [v for k, v in deg_ctr_dict.items()]
y = [eig_ctr_dict[k] for k in deg_ctr_dict.keys()]


plt.figure(figsize=(15,10))
plt.scatter(x, y, alpha = 0.5)
plt.title('Degree Centrality vs Eigenvector Centrality')
plt.xlabel('Degree Centrality')
plt.ylabel('Eigenvector Centrality')
plt.show()

It's interesting to note that Degree Centrality aligns much better with Eigenvector Centrality than with Betweenness Centrality.

Eigenvector centrality measures a node's "influence" on a network, since it takes into account how well connected a node's neighbors are (so a node with many high degree neighbors is given a high eigenvector centrality score).

Since the relationship in the scatter plot looks roughly linear, we can guess that a rapper's number of collaborators is a good proxy for their influence. High degree rappers are collaborating with other high degree rappers, and similarly for low degree rappers. This makes sense: a high degree rapper is probably very well known in the industry and has the power to collaborate with other very popular artists. However, a low degree, up-and-coming rapper who doesn't have the same influence is likely to collaborate with someone who also has low degree, "within their league", so to speak.
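
To make the idea behind eigenvector centrality concrete, here is a minimal power-iteration sketch on a tiny made-up graph. It is not the networkx implementation; the function, the `toy_graph` adjacency dictionary, and the node names are assumptions for illustration. Each node's score is repeatedly replaced by its own score plus the sum of its neighbors' scores (iterating on A + I, which keeps the iteration from oscillating), then rescaled.

```python
import math

def eigenvector_centrality(adj, iterations=200):
    """Power iteration on A + I, normalised to unit length at each step."""
    scores = {node: 1.0 for node in adj}
    for _ in range(iterations):
        updated = {
            node: scores[node] + sum(scores[nb] for nb in adj[node])
            for node in adj
        }
        norm = math.sqrt(sum(value ** 2 for value in updated.values()))
        scores = {node: value / norm for node, value in updated.items()}
    return scores

# Hypothetical toy network: X and Y both have exactly one collaborator,
# but X's collaborator is the hub while Y's is the peripheral node D.
toy_graph = {
    "Hub": {"A", "B", "C", "D", "X"},
    "A": {"Hub", "B"},
    "B": {"Hub", "A"},
    "C": {"Hub"},
    "D": {"Hub", "Y"},
    "X": {"Hub"},
    "Y": {"D"},
}

scores = eigenvector_centrality(toy_graph)
for node, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(node, round(score, 3))
```

X and Y have the same degree, yet X scores higher because its single neighbor is better connected. That is the sense in which the metric rewards collaborating with influential artists.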

Clustering and Assortativity

In [39]:
nx.average_clustering(G_rap)
Out[39]:
0.18773331021381934
In [40]:
nx.degree_assortativity_coefficient(G_rap)
Out[40]:
0.1977133078594039

Our network seems to be only slightly assortative (coefficient of about 0.20) and weakly clustered (average clustering of about 0.19).
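
For intuition about what the clustering number measures, the sketch below computes the local clustering coefficient by hand on a small made-up graph: for each node, the fraction of pairs of its neighbors that are themselves linked. The helper and the `toy_graph` are hypothetical, and nodes with fewer than two neighbors are counted as 0, which is an assumption of this sketch.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of a node's neighbors that are linked to each other."""
    neighbors = adj[node]
    if len(neighbors) < 2:
        return 0.0  # no pairs to check
    pairs = list(combinations(sorted(neighbors), 2))
    linked = sum(1 for u, v in pairs if v in adj[u])
    return linked / len(pairs)

# Hypothetical toy network: a triangle A-B-C with one extra artist D attached to C
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

per_node = {node: local_clustering(toy_graph, node) for node in toy_graph}
average = sum(per_node.values()) / len(per_node)
print(per_node)
print(average)
```

A value near 1 would mean that an artist's collaborators almost all work with each other; a value near 0.19 means that most pairs of an artist's collaborators have not collaborated directly.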

Up and Coming Artists

Here, we take a look at the artists whose neighbors have the highest average degree. A high value likely means that the artist has collaborated with very influential artists (as suggested by the correlation between degree and eigenvector centrality). Looking at some of these artists' connections, we can see that they have collaborated with the influential figures that appear in the degree and eigenvector centrality rankings.

In [96]:
# Average Neighbor Degree
sweg = nx.average_neighbor_degree(G_rap)
sorted(sweg.items(), key=operator.itemgetter(1), reverse = True)[:10]
#sorted(dict(sorted_x), key=lambda x: x[1], reverse = True)[:20]
Out[96]:
[('T.R.U.', 79.0),
 ('Nicole Bus', 76.0),
 ('euro', 71.0),
 ('Ayanis', 62.0),
 ('Yung Joc', 56.09090909090909),
 ('83 Babies', 56.0),
 ('FKA twigs', 55.0),
 ('O.T. Genasis', 51.11764705882353),
 ('Juvenile', 50.53333333333333),
 ('Dame D.O.L.L.A', 49.5)]
In [121]:
def get_collaborators(artist):
    print(artist)
    return [neighbor for neighbor in G_rap.neighbors(artist)]
        
        
print(get_collaborators("T.R.U."))
print(get_collaborators("Nicole Bus"))
print(get_collaborators("euro"))
print(get_collaborators("Juvenile"))
T.R.U.
['2 Chainz']
Nicole Bus
['Rick Ross']
euro
['Lil Wayne']
Juvenile
['Gucci Mane', 'Future', 'Yo Gotti', 'T.I.', 'The Game', 'Snoop Dogg', '50 Cent', 'Fat Joe', 'Ludacris', 'Lil Wayne', 'Chris Brown', 'Dame D.O.L.L.A', 'DJ Drama', 'Missy Elliott', 'Ying Yang Twins']
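
Note that the top of this ranking is dominated by artists with a single collaborator: T.R.U.'s only neighbor is 2 Chainz, so its average neighbor degree is simply 2 Chainz's degree of 79. One possible refinement, sketched here on made-up data, is to require a minimum number of collaborators before ranking. The `min_degree` parameter, the helper, and the toy graph are hypothetical and not part of the analysis above.

```python
def average_neighbor_degree(adj, min_degree=1):
    """Average degree of each node's neighbors, for nodes with >= min_degree neighbors."""
    result = {}
    for node, neighbors in adj.items():
        if len(neighbors) >= min_degree:
            result[node] = sum(len(adj[nb]) for nb in neighbors) / len(neighbors)
    return result

# Hypothetical toy network: "Newcomer" has a single link, to the hub
toy_graph = {
    "Hub": {"N1", "N2", "N3", "Newcomer"},
    "N1": {"Hub", "N2"},
    "N2": {"Hub", "N1"},
    "N3": {"Hub"},
    "Newcomer": {"Hub"},
}

unfiltered = average_neighbor_degree(toy_graph)
filtered = average_neighbor_degree(toy_graph, min_degree=2)

print(sorted(unfiltered.items(), key=lambda item: item[1], reverse=True))
print(sorted(filtered.items(), key=lambda item: item[1], reverse=True))
```

With the filter, an artist only ranks highly if several of their collaborators are well connected, which is arguably closer to what "up and coming" is meant to capture.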

Cliques

A clique of artists is a set of artists such that each member of the clique has collaborated with every other member. Analyzing cliques can be valuable because they show us which artists are working closely together, and may hint at why (same location, same style of rap, etc.).

In [ ]:
# Investigation ongoing ...
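
As a starting point for this investigation, here is a minimal sketch of how maximal cliques can be enumerated with the basic Bron-Kerbosch recursion, shown on a tiny made-up graph. The helper and the toy data are assumptions for illustration; on the real network, networkx's `nx.find_cliques` could be used on `G_rap` instead.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        # No candidates and nothing excluded: current cannot be extended
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for node in list(candidates):
            expand(current | {node}, candidates & adj[node], excluded & adj[node])
            # node is fully explored: move it from candidates to excluded
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adj), set())
    return cliques

# Hypothetical toy network: A, B and C have all worked together; D only with C
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

cliques = sorted(maximal_cliques(toy_graph))
print(cliques)
```

Sorting the resulting cliques by size would surface the largest groups of artists who have all collaborated with one another.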

Visualizing The Network

We start off by using ForceAtlas to visualize the structure of our network, without labels for now.

In [130]:
#TODO: Commenting and cleaning this section
In [116]:
giant = max([G_rap.subgraph(c) for c in nx.connected_components(G_rap)], key=len)
data = json_graph.node_link_data(giant)
In [117]:
forceatlas2 = ForceAtlas2(
                        # Behavior alternatives
                        outboundAttractionDistribution=False,  # Dissuade hubs
                        linLogMode=False,  # NOT IMPLEMENTED
                        adjustSizes=False,  # Prevent overlap (NOT IMPLEMENTED)
                        edgeWeightInfluence=1.5,

                        # Performance
                        jitterTolerance=1.0,  # Tolerance
                        barnesHutOptimize=True,
                        barnesHutTheta=1.2,
                        multiThreaded=False,  # NOT IMPLEMENTED

                        # Tuning
                        scalingRatio=0.5,
                        strongGravityMode=False,
                        gravity=1,

                        # Log
                        verbose=False)
        
positionsUN = forceatlas2.forceatlas2_networkx_layout(giant, pos=None, iterations=2000)
In [118]:
with open('positionsNetwork.json', 'w') as outfile:
    json.dump(positionsUN, outfile)
In [119]:
labelPos = {}
for el in positionsUN:
    labelPos[el] = (positionsUN[el][0],positionsUN[el][1]+2)
In [120]:
cmape = colors.LinearSegmentedColormap.from_list('custom blue', 
                                             [(0,    (0.3, 0.3, 0.3)),
                                              (1,    (0,0,0))], N=5)


fig= plt.figure(figsize=(60,60))
degrees = []


for i in giant:
    degrees.append(giant.degree[i]*6)
    
edges,weights = zip(*nx.get_edge_attributes(giant,'weight').items())

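# Map each edge weight to a line width; every edge must get a width,
# otherwise resWeights and the edge list end up with different lengths
resWeights=[]
for w in weights:
    if w < 5:
        resWeights.append(0.1)
    elif w < 10:
        resWeights.append(0.3)
    elif w < 15:
        resWeights.append(0.5)
    elif w < 20:
        resWeights.append(0.8)
    else:
        # weights of 20 or more (including 25+, previously skipped) get the thickest line
        resWeights.append(1)
    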
    
# draw_networkx_nodes has no with_labels option, and the edge keyword is edgelist (not edges_list)
a= nx.draw_networkx_nodes(giant, positionsUN, node_size=degrees, node_color="blue", alpha=0.9)
b= nx.draw_networkx_edges(giant, positionsUN, edgelist=list(edges), edge_color=weights, edge_cmap=cmape, width=resWeights )
#c= nx.draw_networkx_labels(giant, labelPos,font_size=12)
plt.savefig('HipHop_US_Network_900.png')

Community Detection

Using the Louvain community detection algorithm, we'd like to see what communities exist in our network and extract some interesting insights, for example whether rappers from the same cities or regions are working together (East Coast vs. West Coast).

In [133]:
edgesWeight = dict(giant.edges)
In [134]:
edgesWeightList = []

for i in edgesWeight:
    fromArtist = list(i)[0]
    toArtist = list(i)[1]
    weight = edgesWeight[i]['weight']
    edgesWeightList.append({"from": fromArtist, "to": toArtist, "weight": weight})
In [135]:
with open('nodesDegree.json', 'w') as outfile:
    json.dump(list(giant.degree), outfile)
In [136]:
partition = community_louvain.best_partition(giant)
communities = list(set(partition.values()))
# named community_colors so it does not shadow the `colors` module imported from matplotlib
community_colors = list(pltcolors._colors_full_map.values())[0:len(communities)]
cmap = dict(zip(communities, community_colors))
print("The algorithm has identified %d communities" % len(communities))
The algorithm has identified 10 communities
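
Louvain works by searching for a partition with high modularity, a score that compares the number of links inside each community with what would be expected if links were placed at random. The sketch below computes that score by hand for a tiny made-up graph; the helper, the edge list, and the partitions are hypothetical, and edge weights are ignored for simplicity.

```python
def modularity(edges, partition):
    """Unweighted modularity: sum over communities of L_c/m - (d_c/(2m))^2."""
    m = len(edges)
    internal_links = {}  # links with both endpoints in the community
    degree_sum = {}      # total degree of the community's nodes
    for u, v in edges:
        degree_sum[partition[u]] = degree_sum.get(partition[u], 0) + 1
        degree_sum[partition[v]] = degree_sum.get(partition[v], 0) + 1
        if partition[u] == partition[v]:
            internal_links[partition[u]] = internal_links.get(partition[u], 0) + 1
    return sum(
        internal_links.get(com, 0) / m - (deg / (2 * m)) ** 2
        for com, deg in degree_sum.items()
    )

# Hypothetical toy network: two triangles joined by a single link (c-d)
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"),
             ("d", "e"), ("d", "f"), ("e", "f"),
             ("c", "d")]

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(toy_edges, two_groups)
q_single = modularity(toy_edges, one_group)
print(q_split, q_single)
```

Splitting the two triangles scores higher than lumping everything together, which is the kind of improvement the algorithm looks for when it merges or moves nodes.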

Here is a visualization of our network, colored by community and with artist labels added.

In [137]:
plt.figure(figsize = (100,100))
pos = positionsUN
for com in communities:
    list_nodes = [node for node in partition.keys()
                                if partition[node] == com]
    # size each node by its own degree; node_size must line up with the nodes being drawn
    sizes = [giant.degree[node]*6 for node in list_nodes]
    nx.draw_networkx_nodes(giant, pos, nodelist=list_nodes, node_size=sizes,
                                node_color = cmap.get(com), alpha= 1)

nx.draw_networkx_edges(giant,pos, alpha=0.06)
nx.draw_networkx_labels(giant, pos,font_size=12)
plt.show()
In [19]:
# iterate over the (x, y) values; iterating the dict itself yields artist names
posX = [i[0] for i in positionsUN.values()]
posY = [i[1] for i in positionsUN.values()]